A comparative study of single-channel signal processing methods in fetal phonocardiography

Fetal phonocardiography is a non-invasive, completely passive and low-cost method based on sensing acoustic signals from the maternal abdomen. However, different types of interference are sensed along with the desired fetal phonocardiography. This study focuses on the comparison of fetal phonocardiography filtering using eight algorithms: Savitzky-Golay filter, finite impulse response filter, adaptive wavelet transform, maximal overlap discrete wavelet transform, variational mode decomposition, empirical mode decomposition, ensemble empirical mode decomposition, and complete ensemble empirical mode decomposition with adaptive noise. The effectiveness of those methods was tested on four types of interference (maternal sounds, movement artifacts, Gaussian noise, and ambient noise) and eleven combinations of these disturbances. The dataset was created using two synthetic records r01 and r02, where the record r02 was loaded with higher levels of interference than the record r01. The evaluation was performed using the objective parameters such as accuracy of the detection of S1 and S2 sounds, signal-to-noise ratio improvement, and mean error of heart interval measurement. According to all parameters, the best results were achieved using the complete ensemble empirical mode decomposition with adaptive noise method with average values of accuracy = 91.53% in the detection of S1 and accuracy = 68.89% in the detection of S2. The average value of signal-to-noise ratio improvement achieved by complete ensemble empirical mode decomposition with adaptive noise method was 9.75 dB and the average value of the mean error of heart interval measurement was 3.27 ms.


Introduction
Fetal phonocardiography (fPCG) is a method based on sensing the acoustic signals of the fetal heart from the maternal abdomen providing valuable information about the fetal well-being [1]. The fetal heart sounds (fHSs) were first mentioned in 1650, however, were not officially noise [3]. To remove these interfering signals and thus extract a high-quality signal providing clinically valuable information, it is necessary to choose a suitable filtering algorithm. The aim of this study is to compare the performance of eight algorithms: Savitzky-Golay (S-G) filter, finite impulse response (FIR) filter, adaptive wavelet transform (AWT), maximal overlap discrete wavelet transform (MODWT), variational mode decomposition (VMD), empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD), complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). It is important to note that some of the algorithms (e.g. MODWT or CEEMDAN) have not yet been tested and published for the fPCG extraction. Moreover, the effectiveness of these methods in filtering different types of interference was objectively evaluated using several metrics (e.g. accuracy of S1 and S2 detection, SNR improvement and jDT i j parameter). The interferences tested included, for example, mHSs, movement artifacts, Gaussian noise, ambient noise, and eleven combinations of these disturbances to best simulate the states that occur in clinical practice. The use of a relatively large number of algorithms, noise scenarios, and evaluation metrics make this study unique and comprehensive. In particular, the evaluation of the accuracy of S2 detection, which is a clinically valuable feature, is quite rare except for very few studies focus on this topic, e.g. [16][17][18].

State-of-the-Art
Many authors [3,13,15,[19][20][21][22] have looked into the design and testing of algorithms for fPCG filtration. As well as the filtering itself, some studies [3,5,15,23,24] were aimed at detecting S1 sounds, and only a few authors [17,18] looked into detecting S1 and S2 sounds. The ideal extraction algorithm should both suppress disruptive signals and preserve fPCG morphology so that clinically important information is not lost. A summary of the fPCG signal processing methods will be provided in this section and in Table 1.
• Wavelet transform (WT) was proposed for extraction of fPCG in [13]. The authors did not deal with detection of fHSs, but evaluated the effectiveness of the method only according to signal-to-noise ratio (SNR). The method was tested on 37 synthetic signals, and the best results were achieved with wavelet coif4 and seven levels of decomposition.
• A comparison study of the WT method was carried out by the authors in [15]. A total of 18 WT-based filters were tested for fPCG extraction. S1 sounds were automatically detected by a PCG-Delineator, which is the threshold-based application. The filters were tested on 37 synthetic records and 119 real ones. Evaluation was based on the accuracy of determining fHR and SNR. And the best results were achieved using wavelet coif4, and universal soft thresholding.
• The WT method was also tested in [19]. The authors proposed a new wavelet basis function, designed especially for filtering of fPCG. Fetal wavelet basis function with the threshold rigrsure achieved better results based on mean squared error (MSE) than classic wavelets db5, coif4 and sym7 with higher convergence speeds.
• The authors in [18] used adaptive WT (AWT) for filtering of fPCG. The most effective filtering was achieved with wavelet coif2 and six levels of decomposition. Identification of S1 and S2 was based on time intervals between the peaks and their correspondence to physiological values. The method was tested on 14 women between the 36th and 40th week of pregnancy. Evaluation of the perormance of the method was carried out by comparing fHR plots with Doppler ultrasound monitor, and accuracy of 94-97.5% was achieved.
• In [23] a bandpass filter (BPF) with a frequency band of 25-100 Hz was used. For detection of S1 sounds autocorrelation was used as the dominant method, which proved to be very effective for sections with a low level of interference. If, however, this method was not sufficiently precise for sections with higher levels of interference, a further two methods were used for these sections: WT and matching pursuit (MP). The method was tested on 25 real recordings sensed from the abdominal area of pregnant women in the 34th week of pregnancy. This combined approach achieved accuracy in detection of S1 sounds from 92.9% to 98.5%.
• The authors of study [17] created an iterative algorithm combining the WT method and fractal dimension (FD). The WT method was used for removing disturbances from the fPCG signal using wavelet db4. The FD method was used for detection of all fHSs. Finally, differentiation between S1 and S2 sounds was carried out, based on the fact that diastolic duration is longer than systolic duration. During testing on 19 synthetic recordings, overall accuracy in detection of fHSs of 89% was achieved.

Author, source Noise removal Feature extraction Results
Sbrollini et al. [13] WT -The best results were obtained with coif4 and 7 levels of decomposition Tomassini et al. [15] WT S1 detection was performed using threshold-based application (PCG-Delineator) The best results were obtained with coif4, and universal soft thresholding Chourasia et al. [19] WT -The best results were obtained with new 'fetal' avelet basis function Vaisman et al. [18] AWT S1 and S2 identification was based on time intervals between the peaks and their correspondence to physiological values The best results were obtained with coif4 and 7 levels of decomposition Accuracy in determining the fHR was 94-97.5% Kovacs et al. [23] BPF S1 detection was based on combination of autocorrelation, WT and MP The optimal BPF filter Hz band was 25-100 Accuracy in S1 detection was 92.9-98.5% Koutsiana et al. [17] WT fHSs detection was based on FD and S1 and S2 identification was based on physiological values of cardiac cycle The best results were obtained with db4 Accuracy in S1 and S2 detection and identification was 89% Martinek et al. [3] EMD EEMD AWT S1 detection was based on Pan-Tompkins algorithm Accuracy in S1 detection according to The optimal BPF filter band was 20-200 Hz Accuracy in determining the fHR was 84-91% Huimin et al. [27] EMD-LWT A combination of HT and cepstrum was used to determine fHR The fHR determination was accurate Cesarelli et al. [5] BPF S1 detection was performed using Teager energy operator and logic block based on amplitude thresholding The optimal BPF filter band was 34-54 Hz Accuracy in determining the fHR was 68-99% Samieinasab et al. [22] EMD-NMF-Clustering -Accuracy in determining the fHR was 83-100% Zahorian et al. [28] FIR-Matched filter A combination of Teager energy operator and autocorrelation was used to determine fHR The fHR determination was accurate Ruffo et al. [24] BPF-Matched filter S1 detection was performed using Teager energy, autocorrelation and amplitude thresholding The fHR values were very close to the reference values https://doi.org/10.1371/journal.pone.0269884.t001

PLOS ONE
• AWT methods, empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD) were presented in [3] for fPCG filtration. The Pan-Tompkins algorithm was used for detection of S1 sounds. The method was tested on 12 synthetic recordings, which were distorted by three types of disturbances (ambient noise, Gaussian noise and movement artifacts of the mother and the fetus). Evaluation of the effectiveness of the method was carried out using SNR improvement, mean error of heart interval measurement, fHR, and evaluation of detection of S1 sounds was carried out using statistical parameters: accuracy (ACC), sensitivity (SE), positive predictive value (PPV), and harmonic mean between SE and PPV (F1). According to the ACC parameter, the best results were achieved with AWT in a range of 97.37-100%.
• A single-channel independent component analysis (SCICA) was tested in [20]. First, an appropriate matrix of delays was created, then a multiple FastICA was applied. The method was tested on three real recordings, and the gestation age of the fetuses was 36-40 weeks. The authors did not present any statistical results, but they observed that after filtering of the signals using the given method, S1 and S2 sounds were clearly identifiable.
• Filtering of disturbances with an eigen filter based subspace separation technique with a Wiener filter was presented in [25]. As well as extraction of the fPCG signal the authors also looked into detection of abnormalities (mitral stenosis). An eigenvector based subspace matching system was used for detection of abnormalities. Synthetically generated mitral stenosis was successfully identified with the help of the designed algorithm.
• A single-channel method combining the EMD method, singular value decomposition (SVD) and an efficient version of ICA (EFICA) was proposed in [21]. A combination of all methods was tested on real recordings and even led to effective extraction of signals burdened with high levels of interference. Although the authors did not publish statistical results, they observed that S1 and S2 sounds could clearly be identified.
• The authors in [26] used BPF with a frequency band of 20-200 Hz for filtering of fPCG. Spectograms were then created with the help of short-time Fourier transform (STFT). Finally, the non-negative matrix factorization (NMF) method was used for analysis of the signal and determination of fHR. The authors used real recordings from women in the 38th and 39th week of pregnancy. In addition to fPCG signals, CTG recordings were also made, which served as a reference. The effectiveness of the method was evaluated according to its accuracy in determining fHR with regard to the reference, and accuracy of 84-91% was achieved.
• In [27] a combination of the EMD and and lifting wavelet transform (LWT) methods was used for eliminating interference in the fPCG signal. Subsequently a spectrum of the signal envelope was obtained using Hilbert transform (HT), and the resulting fHR values were obtained using the cepstrum method. The method was tested on 20 real recordings obtained from women between the 30th and 40th week of pregnancy. The authors did not publish statistical results-they only observed that the determined fHR value was accurate.
• The authors in [5] used BPF with a frequency band of 34-54 Hz for filtering of fPCG. Nonlinear time Teager energy operator, which identified high-energy peaks and enhanced S1 sounds, was then applied. Finally, a logic block based on amplitude thresholding was used for detection of S1 sounds. During testing on synthetic data, accuracy of 68-99% was achieved in detection of fHR according to ACC parameters.
• In [22] a single-channel method for extraction of fPCG combining EMD, NMF and clustering methods was proposed. The method was tested on 50 real recordings and simultaneously measured CTG was used as a reference. Accuracy of the algorithm in determining fHR was in relation to the reference 83-100%.
• The device for monitoring fHR proposed in [28] used a BPF FIR filter with an order of 124 and a matched filter for filtering of fPCG signals. For determining fHR, a Teager energy operator was applied, which enhanced the positions with fetal heart beats. Finally, autocorrelation was used, which served to detect periodical components and determine fHR. The method was tested on 12 real recordings. The authors did not publish statistical results, but they concluded that this method is effective for determining fHR.
• For filtering of fPCG signals in [24] a combination of matched filter and BPF with a frequency range of 34-54 Hz was used. S1 sounds were then enhanced using a Teager energy operator and detected with the help of autocorrelation and amplitude thresholding. The method was tested on real recordings obtained from women between the 30th and 40th week of pregnancy. The accuracy of the method was evaluated based on determining fHR with regard to reference values of fHR obtained from CTG, which was recorded together with fPCG. The authors concluded that fHR values determined using this method were very close to the reference values determined from CTG.
From the above it emerges that objective comparison of testing method performance is problematic, because authors use different signals (real or synthetic) disturbed by various levels and types of interference. Some authors [3,5,17,18,22,23,26] then evaluate the effectiveness of filtering using objective statistical parameters and some [20,21,27] only subjectively evaluate the extracted waveform. The aim of this study is to carry out an objective and uniform comparison of eight algorithms for filtering of fPCG for various types and levels of disturbance and evaluate their effectiveness using statistical parameters. This comparative study could therefore help find the optimal algorithm for processing fPCG, which could be implemented in devices for home monitoring and analysis of the heart activity of fetuses.

Materials and methods
On the basis of in-depth research eight algorithms (S-G filter, FIR filter, AWT, MODWT, VMD, EMD, EEMD, and CEEMDAN) were chosen for filtering fPCG which have the potential to effectively filter interference. This section also includes a description of the reference signals and disturbances which were generated for testing these algorithms. The evaluation parameters which were used for evaluating the quality of filtering and accuracy of detection of S1 and S2 sounds are also described.

Filtration algorithms
This subsection summarises the basic information about algorithms. As the majority of the algorithms are very well described in the literature, only the basic facts are given here. For each method the literature is cited, where extra information can be found.
• Savitzky-Golay filter-polynomial S-G filter is a widely used method for smoothing and differentiating time series, and also biomedical data [29]. The technique is based on least squares fitting of a lower order polynomial to a number of consecutive points [30]. The aim of filtering using S-G is to find co-efficients that increase the accuracy of data, and also maintain the trend of the given signal [29]. To achieve good results, it is necessary to find a compromise when choosing the length of the window and the polynomial order for the tested data. A detailed description of this technique can be obtained in [29][30][31]. S-G was used in [32] as part of the filtering algorithm for fPCG and for processing of adult PCG in [33,34].
• Finite impulse response filter-the non-recursive FIR filter can also be categorised as one of the frequently used filters for processing biomedical signals [35]. This is a filter whose impulse response has finite length. The advantage of the filter is its stability and linear phase response, where there is the same delay in harmonic sections with no phase distortion [35]. For correct functioning of the FIR filter it is necessary to choose an appropriate filter length and cut-off frequency. Further information can be found in [31,35,36]. The FIR filter was tested for filtering of fPCG in [28] and for filtering of adult PCG in [37].
• Adaptive wavelet transform-methods based on WT are among the most frequently used techniques for processing non-stationary signals, and thus also for filtering of fPCG. The advantage of the method is the representation of the processed signal both in a time and frequency domain [3,19]. The first step is the decomposition of the input signal, when co-efficients are obtained. In the case of AWT, this is followed by adaptive thresholding of these co-efficients. Each co-efficient is assigned a certain threshold value, which corresponds to the changes in interference output in the signal (this is achieved using a moving window) [3]. Inverse WT is applied for reconstruction of the filtered signal. In order for filtering of the signal to be effective, the appropriate type and width of wavelet, and the appropriate number of decomposition levels must be chosen. More information about the method can be found in [38]. WT was tested for the purposes of fPCG filtering in [13,15,19], and speciically the AWT method in [3,18].
• Maximal overlap discrete wavelet transform-the MODWT method can also be placed in the WT family (also known as undecimated discrete wavelet transform), which is based on the principle of leaving out the down-sampling process [39]. The wavelet co-efficients therefore have the same length as the input signal at each level, and offer better approximisation. Inverse WT and thresholding follow [40]. Again the choice of the type of wavelet, wavelet length, and the number of decomposition levels for the given type of signal play an important role. Further information can be obtained in [39,40]. For processing adult PCG signals the method in [41] was used.
• Variational mode decomposition-the VMD method is a relatively new quasi-orthogonal technique based on the decomposition of the input signal into intrinsic mode functions (IMFs). These IMFs represent a separated frequency band of the processed signal [42,43]. The method uses a calculation of a one-way frequency spectrum using HT and the shift of individual modes to baseband. The width of the band of each mode is estimated using Dirichlet energy of the demodulated signal [44]. VMD is an alternative to the EMD method, however in contrast to EMD individual IMFs are extracted simultaneously and non-recursively [42]. More detailed information can be found in [42][43][44]. For the purposes of processing fPCG the method in [45] was used, and for processing adult PCG the method in [44] was introduced.
• Empirical mode decomposition-EMD is a filtering technique appropriate for processing non-stationary and non-linear signals. As in the VMD method, the input signal is decomposed into internal oscillatory functions-IMFs, which represent a specific frequency band [21]. The principle of the method is based on the detection of upper and lower envelope of the signal by detecting the local maxima and minima. The mean of envelopes is then calculated and subtracted from the input signal. The resulting signal is denoted as IMF1 if it fulfills the conditions for IMFs. Further IMFs are extracted by repeating the whole procedure, however instead of the input signal, residue is used, which is created by subtracting IMF1 from the input signal [21,46,47]. The effectiveness of the EMD method is lowered by the mode mixing problem, where one IMF covers multiple components with different frequencies [48]. Further information can be obtained in [3,21,[46][47][48]. The method was tested for processing fPCG in [3,21,22,27] and for processing adult PCG in [49,50].
• Ensemble empirical mode decomposition-the EEMD method was proposed in order to overcome the limitations of the EMD method, and resulted in more effective filtering of signals. EEMD works on the principle of adding white noise to the input signal and carrying out a pre-chosen number of EMD cycles [3,51]. Individual IMFs, which are created by averaging the results of all EMD cycles, are the output of the algorithm [3,52]. The disadvantage of the EEMD method is its low computational speed. A detailed description is presented in [3,51,52]. The EEMD method was used for processing fPCG in [3], and for processing adult PCG in [53,54].
• Complete ensemble empirical mode decomposition with adaptive noise-the CEEMDAN method was designed with the aim of overcoming the limitations of the EEMD method. CEEMDAN works on the same principle as EEMD with small differences [55]. Paired positive and negative adaptive white noise is added to the input signal, which is able to contribute more to elimination of mode mixing. The predetermined number of EMD cycles is then carried out and the resulting IMFs are determined by averaging the outputs of all EMD cycles [55,56]. The resulting IMFs are however counted sequentially, which leads to an increase in the computational speed of the algorithm. More information about the method can be found in [55,56]. Processing of adult PCG using CEEMDAN was carried out in [54,57].

Fetal heart sounds detection
Detection of S1 and S2 sounds was inspired by study [58]. The principle of the method is based on a combination of HT, and threshold and deciding factors. First, the envelope of the input signal was detected using HT. The signal envelope contained residues of interference, so it was smoothed out using a low-pass filter (LPF). All peaks were then detected, and potential S1 and S2 sounds were found. Peaks which were above the threshold value, which was set as 0.4 times the maximal amplitude of the envelope, were labelled as potential S1 and S2 sounds. Other peaks were excluded. In order to avoid detection of extra peaks, a principle was established for their elimination, through setting a minimum time interval between peaks of 100 ms. If the time interval was shorter, an extra peak was searched for between peaks. If more such peaks were found in this interval, the peak with the highest amplitude was preserved, and lower peaks were excluded.
Due to the variability of fHSs amplitude, it was also necessary to deal with the possibility that some of the fHSs appeared under the set threshold and could therefore affect subsequent classification. After excluding extra peaks, the shortest interval between two fHSs was chosen and a limit with a value twice as large as that interval was defined. If a time interval with a value larger than the assigned limit was then detected, the peak with the maximum amplitude in that interval was found and the lost peak was restored.
Finally, peaks classified as S1 and S2 sounds were detected on the basis of the physiological characteristics of the heartbeat. The systolic interval between S1 and S2 sounds is usually shorter than the disastolic interval between S1 and S2 sounds. The longest time interval was discovered between the detected peaks, and the first peak was labelled as S2 and the second as S1. Further peaks were labelled in sequence. An example of individual detection steps is shown in Fig 1.

Reference signals and noise
For testing filtering methods, it was necessary to choose appropriate signals. Unfortunately at the current time there are only three publicly accessible databases: Shiraz University Fetal Heart Sounds Database [22], Fetal PCGs Database available in PhysioBank archive [59], both containing real data. A Simulated Fetal Phonocardiograms Database [5] containing synthetically generated signals with different fetal states (physiological or pathological) and recording conditions. One obstacle in testing algorithms on real recordings is the absence of a reference signal against which the accuracy of filtering methods could be evaluated.
For this reason, we used our own synthetic signals for testing. We generated two reference signals (r01 a r02), to which we added the four most commonly occurring types of disturbances during fPCG recording in real conditions (mHSs, maternal and fetal movement artifacts, white Gaussian noise and ambient noise). In order to best simulate the influence of disturbances on the quality of the signal in real conditions, where multiple types of interference can work simultaneously, we additionally loaded the reference signals with combinations of individual types of disturbance (e.g. mHSs and movement artifacts, Gaussian noise and ambient noise etc.). In total for each signal 15 types of disturbances were tested (four individual types of disturbance and eleven combined types). Signal r01 was loaded with lower levels of disturbance (SNR of signal with mHSs: -0.53 dB, movement artifacts: -0.84 dB, Gaussian noise: -1.20 dB, and ambient noise: -2.25 dB), while signal r02 was loaded with lower levels of disturbance (SNR of signal with mHSs: -1.82 dB, movement artifacts: -2.49 dB, Gaussian  [60] along with all extracted signals that were obtained using the tested algorithms.
Generation of reference signals and individual types of disturbance was inspired by study [4], and can be summarised as follows: • Reference signals-reference signals were modelled using Gaussian modulated sinusoid (detailed information can be found in [4]). Signals with a length of 300 s represented a fetus with a gestational age of 40 weeks, with a sampling frequency of 1000 Hz and average fHR of 140 bpm. The ratio of S1 and S2 sounds was 1.7, central frequency of S1 was 36.89 Hz, central frequency of S2 was 55.18 Hz and S1 and S2 time inter-distance was 140 ms.

PLOS ONE
• mHSs-interference occurs in the frequency band 10-40 Hz and like the reference signals was modelled using Gaussian modulated sinusoid. Average mHR was 70 bpm, the ratio of S1 and S2 sounds was 1.54, central frequency of S1 was 16.93 Hz, central frequency of S2 was 30.44 and S1 and S2 time inter-distance was 331 ms.
• Maternal and fetal movement artifacts-artifacts caused by movement of limbs, head, or change in position of the fetus occurring in the frequency band 0-25 Hz and manifesting as random impulses in fPCG. Artifacts caused by movement of the mother also manifested themselves as random impulses in fPCG, though in a frequency range of 0-100 Hz. Interference was modelled as random pulses with a fixed amplitude lasting 0.5 to 1.5 s.
• White Gaussian noise-this is random interference, which can be caused by womb contractions, maternal breathing artifacts, digestive sounds or quantization noise of the transducers. Interference was modelled as random Gaussian noise with the same power in any band of the same width.

PLOS ONE
• Ambient noise-broadband interference comprising frequencies from 10 Hz, caused by for instance speech, coughing, closing doors etc. Interference was modelled by a fifth order Butterworth high-pass filter with a cut-off frequency of 100 Hz.

Evaluation methods
Objective evaluation of the effectiveness of the methods was carried out by comparing the accuracy of detection of S1 and S2 sounds, calculation of SNR improvement and determination of mean error of heart interval measurement jDT i j.
• Accuracy of S1 and S2 sounds detection-in order to establish the accuracy of fHSs detection first of all true positive (TP) values were established, as correctly detected S1 or S2 sounds, set ±50 ms [3,61] from equivalent S1 or S2 sounds in the reference signal. False positive (FP) values were then set, as incorrectly detected S1 or S2 sounds, and a false negative (FN), as

PLOS ONE
existing, but undetected S1 or S2 sounds. Finally the statistical parameter the accuracy (ACC) [3,61] in percentages (%) was determined: • Signal-to-noise ratio improvement-the parameter was set as the difference between the original SNR value of the disturbed signal (SNR in ) and the SNR value of the filtered signal (SNR out ). The higher the SNR improvement value, the more effective filtering was. The where fPCG ref (m) is the reference signal, fPCG in (m) is the input signal containing interference, fPCG filt (m) is the signal after application of the filtering method and M is the number of samples of the reference signal.
• Mean error of heart interval measurement jDT i j-the parameter determines the mean value of the measurement error |ΔT i |, which was calculated as the absolute value of heart interval Table 6. Statistical evaluation of the SNR improvement (dB).

PLOS ONE
differences |ΔT i | in milliseconds (ms) [62]: where T i filt is value i-of the interval of the filtered signal and T i ref is value i-of the interval of the reference signal.

Algorithms settings
In order to objectively test all filtering methods, it was necessary to find their optimal setting for each type and level of interference. That was achieved with the help of automated algorithm. For each combination of set parameters, the automated algorithm compared the filtered signal with the reference signal and calculated ACC values. The setting (as well as the filtered signal) with the highest ACC value was chosen. The whole process is shown in Fig 3. Table 7. Statistical evaluation of the parameter jDT i j (ms).

PLOS ONE
For S-G filter, FIR filter, AWT and MODWT the optimal parameter settings are summarised in Table 2 and for VMD, EMD, EEMD and CEEMDAN in Table 3. For S-G filter it was necessary to set the length of the window and the polynomial order. For FIR filter it was necessary to choose an appropriate filter system (the BPF type with a frequency band of 20-110 Hz was used). For the AWT and MODWT methods it was necessary to choose an appropriate type of wavelet, wavelet width and number of decomposition levels. The symlet, coiflet and Daubechies wavelets were tested because their shape, energy and frequency spectrum is similar to that of fHSs [3]. For the EEMD a CEEMDAN methods it was necessary to choose the appropriate number of ensemble trials N and the standard deviation of the added noise Nstd. All four methods VMD, EMD, EEMD, and CEEMDAN were based on the principle of

PLOS ONE
decomposition of the input signal into simpler signals-IMFs. The total number of extracted IMFs was dependent on the character of the input signal and extraction of IMFs took place as long as it was not possible to extract further IMFs. This was in the case where the signal was a constant, monotone function or a function with one extreme. For these methods it was therefore necessary to choose an appropriate combination of IMFs, which contributed to the creation of the resulting filtered signal. An example of three IMFs for the VMD, EMD, EEMD, and CEEMDAN methods is shown in Fig 4.

Results
The efficiency of the S-G filter, FIR filter, AWT, MODWT, VMD, EMD, EEMD and CEEM-DAN was evaluated against reference signals. In total 15 types of disturbance were filtered (four individual types of disturbance and eleven combinations) for two recordings r01 and r02. Evaluation of the effectiveness of the methods was carried out by detection of S1 and S2 sounds, calculation of SNR improvement and determination of parameter jDT i j. The best result for the given type of disturbance was highlighted in the table (for detection of S1 and S2 sounds and SNR improvement the highest values were highlighted and for parameter jDT i j the lowest values).

Accuracy of S1 and S2 sounds detection
Evaluation of the accuracy of S1 and S2 sounds detection was carried out by determining the values of TP, FP and FN, and then calculating the ACC parameter. The resulting ACC values for detection of S1 sounds for both recordings r01 and r02 are summarised in Table 4 and the resulting ACC values for detection of S2 sounds for both recordings r01 and r02 are summarised in Table 5.
According to Table 4 all tested algorithms, except VMD, achieved effective extraction and accurate detection of S1 sounds, as the average ACC values exceeded 80%. Based on the average of the ACC values, the most effective algorithm was the CEEMDAN (91.53%), followed by the EEMD method, which also achieved an average ACC value of over 90% (90.16%). The S-G filter, FIR filter, AWT, MODWT and EMD methods can be considered less suitable as their average ACC values did not exceed 90% (88.87%, 83.92%, 88.45%, 87.78%, and 84.92%, respectively). The VMD method reached an average accuracy of 78.84% and can be considered the least effective.
According to Table 5 in the detection of S2 sounds, lower accuracy was generally achieved, as none of the methods reached an average ACC value of over 80%. Based on the average of the ACC values, the most effective algorithm was the CEEMDAN (68.89%), followed by the MODWT (68.75%) and FIR filter (68.48%). The EEMD, EMD and S-G filter with an average accuracy of 66.50%, 63.36%, and 60.01%, respectively, can be considered even less effective. The S2 sounds were significantly suppressed by VMD and AWT, which reached the lowest average ACC values (52.10% and 49.34%, respectively).

Signal-to-Noise ratio improvement
The resulting SNR improvement values are summarised for both recordings r01 and r02 in Table 6. The best results in SNR improvement were achieved with the CEEMDAN method with an average value of 9.75 dB, followed by EEMD with an average value of 8.30 dB. Lower average SNR improvement values were obtained by AWT (7.63 dB) and FIR filter (7.40 dB). These methods reached satisfactory results in some cases, but low in others which caused the average SNR improvement to be lower. For example, AWT achieved the highest SNR improvement in the case of Gaussian noise and ambient noise in both r01 and r02 recordings (14.10 dB, 11.23 dB, 13.96 dB, and 12.27 dB, respectively) or for the combination of Gaussian noise and ambient noise in r01 recording (12.08 dB). But on the other hand, in the case of mHSs in r02 recording, AWT achieved the lowest SNR improvement value (4.59 dB). The situation was similar for the FIR filter, which achieved the highest value of SNR improvement in the case of a combination of mHSs and Gaussian noise in r01 recording (11.91 dB) but the second lowest value in the case of a combination of movement artifacts, Gaussian and ambient noise in r02 recording (3.53 dB) or in the case of the combination of all four types of interference in r02 recording (3.31 dB). The lowest SNR improvement was achieved with VMD, S-G filter, EMD and MODWT (6.78 dB, 6.77 dB, 6.19 dB, and 5.60 dB, respectively).

Mean error of heart interval measurement
The resulting values of the jDT i j parameter are summarised for both recordings r01 and r02 in Table 7. The lowest average jDT i j value and thus the best result was obtained again using the CEEMDAN method with an average value of 3.27 ms, followed by EEMD with an average value of 3.50 ms. Less effective were AWT, MODWT, S-G filter, VMD and EMD, as the average jDT i j values exceeded 4 ms (4.31 ms, 4.40 ms, 4.42 ms, 4.53 ms, and 4.78 ms, respectively).

PLOS ONE
The FIR filter can be considered the least effective, as the average jDT i j value exceeded 5 ms (5.12 ms).

Statistical analysis
To determine whether the differences of the results provided by the individual algorithms are statistically significant, we performed a statistical analysis of the results obtained for all evaluation parameters used (ACC when detecting S1 and S2, SNR improvement and jDT i j). Statistical analysis was performed using R Core Team [63]. In all cases, statistical significance was set at p <0.05.
First, normality of the data was tested for each algorithm and each interference level using the Shapiro-Wilk test. In some cases, statistically significant deviations from normality were detected, and therefore non-parametric methods, median and interquartile range (IQR), were selected to describe the data. Descriptive statistics were performed separately for record r01, which was exposed to lower levels of interference (referred to as low noise level), and separately for record r02, which was subjected to higher levels of interference (referred to as high noise level).
The Kruskal-Wallis test was used to determine statistically significant differences between the compared algorithms in terms of individual evaluation parameters (H 0 : Medians of the evaluation parameter are the same for all algorithms, H A : Difference between at least one pair of medians of the evaluation parameters is statistically significant). If a statistically significant difference between the compared algorithms was detected for the medians of an evaluation parameter, a post hoc analysis was performed using Dunn's test and multiple comparison pvalues were adjusted with the Benjamin-Hochberg method.
For the ACC parameter, a statistically significant difference was found between the compared algorithms in the case of signals affected by low interference levels, both in the detection of S1 sounds and in the detection of S2 sounds (p-value <0.001 in both cases), see Table 8. In the case of S1 sounds detection, the VMD algorithm was identified as the algorithm with low ACC, the difference of the rest of the compared algorithms was not statistically significant in terms of the ACC parameter.
In the case of S2 sounds detection, two homogeneous subgroups of algorithms were identified, i.e. subgroups of algorithms where the difference between medians of the ACC parameter was not statistically significant. The first group consisted of the S-G, FIR, MODWT, EMD, EEMD, and CEEMDAN algorithms; the second homogeneous subgroup consisted of the S-G, AWT, and VMD algorithms. It can be noted that the S-G algorithm can be classified in terms of the ACC parameter both in the subgroup of algorithms with higher ACC and in the subgroup of algorithms with lower ACC.
For signal affected by high levels of interference, no statistically significant difference was observed between the compared algorithms in terms of ACC parameter in the detection of S1 sounds (p-value = 0.355), nor in the detection of S2 sounds (p-value = 0.364), see Table 8. A comparison of algorithms in S1 and S2 detection assessed by the ACC parameter is shown in Figs 5 and 6, respectively.
In the case of the SNR improvement, a statistically significant difference was found between the compared algorithms at low and high interference levels (in both cases p-value <0.001), see Table 9. At a low interference level, three homogeneous subgroups of algorithms were identified. The first two subgroups included algorithms with higher values of SNR improvement; the first subgroup consisted of FIR, AWT, EEMD, and CEEMDAN, the second subgroup consisted of S-G, FIR, AWT, VMD, and EMD. On the other hand, statistically significantly lowest values of SNR improvement were observed with the MODWT algorithm, forming the third subgroup. In the case of a high level of interference, the CEEMDAN algorithm was identified as the algorithm with the statistically significantly highest SNR improvement; no statistically significant differences were observed between the other algorithms.
In the case of the parameter jDT i j, no statistically significant difference was found between the compared algorithms, both for the signals affected by low interference levels (pvalue = 0.692) and those affected with high interference levels (p-value = 0.704), see Table 9. Graphical presentation of the comparison of algorithms in terms of SNR improvement and jDT i j is demonstrated in Figs 7 and 8, respectively.
To verify the effect of the interference level on the ACC parameter, the ACC ratios of the low and high interference levels for all compared algorithms were analyzed for both S1 and S2

PLOS ONE
sounds detection. Statistically significant difference (median ACC ratio less than or greater than one) meant that there was a statistically significant difference between ACC for low noise level and high noise level. The ACC ratio values greater than one thus indicated a higher ACC at low interference levels. Non-parametric methods were again used for descriptive statistics as well as for statistical induction methods. Significance of ACC ratio was tested by two-tailed Wilcoxon signed-rank test (H 0 : The median of ACC ratio is equal to one, H A : The median of ACC ratio is not equal to one). For all compared algorithms, both in the case of S1 sounds detection and in the case of S2 sounds detection, a statistically significant effect of the interference level on the ACC parameter was identified (in all cases p-values � 0.002), see Table 10.

PLOS ONE
Finally, we used the Kruskal-Wallis test to find statistically significant differences between the compared algorithms with respect to the ACC ratio (H 0 : The medians of the ACC ratio are the same for all compared algorithms, H A : The difference of at least one pair of medians is statistically significant). With regard to the ACC ratio, no statistically significant difference was found between the compared algorithms, both in the detection of S1 sounds (p-value = 0.725) and in the detection of S2 sounds (p-value = 0.579), see Table 10. The Figs 9 and 10 provide a comparison of the algorithms using the hybrid boxplots in terms of the ACC ratio for S1 and S2 sounds, respectively.

Discussion
Based on the evaluation of average values of objective parameters in the detection of S1 and S2 sounds, SNR improvement and jDT i j parameter, the best results were achieved using the CEEMDAN method. The EEMD method achieved very promising, although slightly worse results than CEEMDAN for all evaluated parameters. In addition, EEMD was computationally more complex than CEEMDAN. The EMD and VMD methods only achieved satisfactory results according to the parameter jDT i j. In detection of S1 and S2 sounds and in SNR improvement its performance was poor. However, compared to the EEMD and CEEMDAN

PLOS ONE
methods, their computational complexity was significantly lower. The AWT and MODWT methods achieved very promising results in detection of S1 sounds according to the jDT i j parameter. AWT achieved satisfactory results in SNR improvement, but on the other hand it achieved the worst average results of all methods in detection of S2 sounds. MODWT was effective in detection of S2 sounds, however it achieved the worst average results in SNR improvement. The FIR filter achieved satisfactory results in detection of S2 sounds and SNR improvement, however weak results in detection of S1 sounds and the worst results according to the jDT i j parameter. The S-G filter achieved very promising results in detection of S1 sounds and according to the jDT i j parameter, however in detection of S2 sounds and SNR improvement its performance was unsatisfactory. In this section, the difference in extraction accuracy achieved by individual methods will be presented, especially in terms of S1 and S2 sounds detection. Furthermore, the influence of the interference level and the presence of multiple types of disturbance will be shown. An example of extracted signals for recording r02 loaded with individual types of disturbance is shown in Fig 11. It can be seen that all types of interference were sufficiently suppressed with regard to S1 sounds detection and S1 sounds could therefore be accurately detected (all methods in filtering of all four types of interference achieved ACC > 86%). However, during filtering mHSs, S-G filter, AWT, VMD and EMD were unable to effectively eliminate the maternal component, which led to lower accuracy in detection of S2 sounds (ACC < 81%). When filtering movement artifacts, elimination of interference was not sufficient using the S-G filter and AWT, which also led to very low accuracy in detection of S2 sounds (ACC < 60%). In the case of Gaussian noise, the AWT method effectively suppressed the interference, but in addition, the S2 sounds were also suppressed and their detection was therefore not successful (ACC = 60.81%). Detection of S2 sounds was also unsuccessful when using the VMD method, as interference was not sufficiently suppressed and S2 sounds were not correctly detected (ACC = 55.39%). When filtering ambient noise, the AWT and MODWT methods suppressed

PLOS ONE
interference as well as S2 sounds, which led to low accuracy in their detection (ACC = 54.10% and 77.12%, respectively). On the other hand, the VMD and EMD methods were unable to sufficiently suppress interference and detection of S2 sounds was also inaccurate (ACC = 60.09% and 41.53%, respectively).
The results of the study also showed the effect of level of interference on the resulting quality of the extracted signals, see Fig 12. Results for recording r01, which was loaded with a lower level of interference (SNR of signal with mHSs: -0.53 dB, movement artifacts: -0.84 dB, Gaussian noise: -1.20 dB, and ambient noise: -2.25 dB) were compared with recording r02, which was loaded with a higher level of interference (SNR of signal with mHSs: -1.82 dB, movement

PLOS ONE
artifacts: -2.49 dB, Gaussian noise: -3.56 dB, and ambient noise: -5.74 dB). When filtering mHSs in recording r01 the maternal component was completely eliminated (in detection of S1 and S2 sounds ACC = 100%), however in the case of recording r02 residues of the maternal component remained in the signal. Although it did not significantly affect the accuracy of detection of S1 sounds (ACC = 99.13%), it decreased the accuracy of detection of S2 sounds (ACC = 92.89%). This was similar when filtering movement artifacts. In the case of recording r01 accurate detection of S1 (ACC = 100%) and S2 sounds (ACC = 98.41%) was achieved. But in the case of recording r02, the insufficient elimination of interference led to a fall in accuracy in detection of S1 sounds (ACC = 95.43%) and inaccurate detection of S2 sounds (ACC = 78.28%). When suppressing Gaussian noise, interference was effectively filtered in both recordings r01 and r02 and detection of S1 sounds was accurate (in both recordings ACC = 100%). However, residues of interference led to slightly worse extraction in recording r02 and lower accuracy in S2 sounds detection (ACC = 94.83%) compared to recording r01 (ACC = 99.27%). In the case of ambient noise, the level of interference also significantly influenced the resulting quality of the extracted signal. In recording r01 accurate detection of S1 (ACC = 100%) and S2 sounds (ACC = 98.7%) was achieved, but in the case of recording r02, interference was not filtered out, which led to less accurate detection of S1 (ACC = 97.69%) and S2 sounds (ACC = 84.58%).
As well as the level of interference, the presence of multiple types of disturbance influenced the overall extraction quality, see an example for recording r02 in Fig 13. If only mHSs were present in the signal, the interference was eliminated and accurate detection of S1 (ACC = 99.13%) and S2 sounds (ACC = 92.89%) was achieved. If movement artifacts were

PLOS ONE
added to mHSs, residues of interference led to a slightly lower accuracy of S1 sounds detection (ACC = 95.53%), but a significantly lower accuracy of S2 sounds detection (ACC = 52.34%). When adding further interference in the form of Gaussian noise, the interference was not sufficiently suppressed. This led to significantly lower values of S1 (ACC = 80.23%) and S2 sounds (ACC = 34.64%) detection. The worst results in detection of S1 (ACC = 55.02%) and S2 sounds (ACC = 29.3%) was achieved when adding ambient noise, and therefore loading the signal with all four types of interference.

Summary and future directions
This study focused on the comparison of eight single-channel both conventional (S-G filter, FIR filter) and advanced (AWT, MODWT, VMD, EMD, EEMD, CEEMDAN) signal-processing algorithms. The use of a relatively large number of algorithms and objective evaluation parameters (accuracy of S1 and S2 sounds detection, SNR improvement and jDT i j parameter) can be considered as an advantage of this study. In particular, the evaluation of the accuracy of S2 detection is not very common in the field of fPCG (except for very few publications, e.g. [16][17][18]), although this information is useful for clinical practice. The benefit of the study is also to test the performance of algorithms in many scenarios, such as different types and levels of interference. Overall, the methods were tested on signals loaded with 30 levels of interference (SNR values from -0.53 dB to -10.76 dB), including the most common types of interference (mHSs, movement artifacts, Gaussian noise, ambient noise) and their combinations. In particular, testing on signals affected with more than one type of interference is valuable as it reflects situations that are very likely to occur when measuring in real conditions. In addition, to our best knowledge, some algorithms (MODWT or CEEMDAN) have not yet been tested and published at all for the fPCG extraction. Conversely, performing experiments solely on synthetic signals can be considered a limitation of this study since tests on real signals can show slightly different results. In addition, testing was performed only on signals corresponding to the 40th week of pregnancy. As fPCG signals change throughout pregnancy, especially in terms of useful signal amplitude, further testing of algorithms on fPCG signals corresponding to other gestational ages is necessary. Another disadvantage may be offline testing, which may not fully address the problems associated with online implementation. This is associated mainly with the need to optimize algorithms in real time or to process the input signals piece-by-piece (as opposed to having the entire input signal available).
The results of statistical analysis presented herein showed no statistically significant difference between performance of the individual algorithms in terms of the parameter jDT i j at signals with low as well as high interference levels. For the ACC parameter, assessing the ability to detect S1 and S2, this applied at signals loaded by high interference levels. Contrary, when the signal was loaded with low levels of interference, a statistically significant difference was identified between the algorithms for the ACC parameter. A statistically significant difference between the algorithms was also found in the case of the SNR improvement parameter when the signal was loaded with both low and high levels of interference. Furthermore, for all compared algorithms, a statistically significant effect of the interference level on the ACC parameter was identified in the case of both S1 and S2 sounds detection. However, with respect to the ACC ratio (low noise level/high noise level), no statistically significant difference was found between the compared algorithms, both in the detection of S1 and S2.
Based on the evaluation of average values of objective parameters, CEEMDAN proved to be the most effective method for detecting S1 and S2 sounds with average accuracy of ACC = 91.53% in the detection of S1 and ACC = 68.89% when S2 is detected. In addition, CEEMDAN also outperformed the other tested methods in terms of improving SNR and the jDT i j parameter. Compared to EEMD, CEEMDAN was computationally faster and allowed implementation in real-time operating device. The benefits of the CEEMDAN algorithm can be summarized as follows: • Single-channel approach-channel approach-provides higher comfort and mobility for the pregnant woman.
• High quality extraction-even for signals with relatively noisy signals.
• High accuracy in detecting S1-ensures the ability to determine fHR accurately.
• Low computational complexity-enables implementation in real-time operating devices.
On the other hand, accurate detection of S2 sounds proved to be difficult for all algorithms, including CEEMDAN. This was probably due to the lower magnitude of S2 compared to S1. As a result, S2 sounds were less distinct from noise and their subsequent extraction and detection was inaccurate. Therefore, future research should be focused on the refinement of S2 detection.
It would also be beneficial for clinical practice to detect and classify pathological heart murmurs that can help detect congenital heart defects. Algorithms based on artificial intelligence and machine learning could be used for classification of fetal pathological conditions. However, very few authors have dealt with the use of artificial intelligence and machine learning in the field of fPCG. This may be because these methods require a large amount of physiological and pathological data for both training and testing, but these data are not available in the field of fPCG. For these reasons, our further research will focus on creating a large dataset containing both real pathological and physiological fPCG records. The dataset will include information on fetal gestational age, sensor placement, maternal and fetal health, and reference annotations with fetal and maternal HSs locations will be created so that the efficiency of extraction algorithms can be objectively evaluated. Thus, other filtering methods will be further tested in the future, including multi-channel algorithms or hybrid methods combining multiple algorithms to achieve more accurate extraction.

Conclusion
In this study eight algorithms were compared (S-G filter, FIR filter, AWT, MODWT, VMD, EMD, EEMD, and CEEMDAN) for fPCG extraction to eliminate mHSs, movement artifacts, Gaussian noise, ambient noise and eleven combinations of these disturbances. Testing was carried out on two synthetic recordings r01 and r02, where recording r02 was loaded with higher levels of interference than recording r01. The evaluation was performed by the assessment of the accuracy of S1 and S2 sounds detection, SNR improvement and jDT i j parameter. In all parameters the best results were achieved by the CEEMDAN method. Very promising results were also achieved using the EEMD method, however compared to CEEMDAN, EEMD was computationally more complex. It was shown that when loading an input signal with a higher level of interference or multiple types of disturbance, the quality of extraction was worsened and important clinical information was lost. When recording fPCG it is therefore necessary to ensure optimal conditions, particularly appropriate placing of the sensor and eliminating interference, which could unnecessarily contaminate a useful signal. Future research will focus on testing the CEEMDAN method on real physiological and pathological recordings and on creating our own database with real recordings which will be provided to the scientific community for testing extraction algorithms. Furthermore, other algorithms will be tested, including multichannel algorithms or hybrid methods combining multiple algorithms to increase extraction efficiency.